Abstract. We solve a problem, stated in [CGP10], showing that Sticky
Datalog±, defined in the cited paper as an element of the Datalog±
project, has the finite controllability property. In order to do that, we
develop a technique, which we believe can have further applications, of
approximating Chase(D, T), for a database instance D and some sets of
tuple generating dependencies T, by an infinite sequence of finite
structures, all of them being models of T.



1 Introduction 



Tuple generating dependencies (TGDs), recently also known as Datalog 
rules, are studied in various areas, from database theory to descrip- 
tion logics and in various contexts. The context we are interested in 
here, is computing certain answers to queries in the situation when 
some semantical information about the database is known (in the 
form of database dependencies, TGDs), but the knowledge of the 
database facts is limited. 

Let us remind the reader that a TGD is a formula of the form
$\forall \bar{x}\, (\Phi(\bar{x}) \Rightarrow \exists y\, Q(y, \bar{y}))$,
where $\Phi$ is a conjunctive query (CQ), $Q$ is a relation symbol,
$\bar{x}, \bar{y}$ are tuples of variables and $\bar{y} \subseteq \bar{x}$.
The universal quantifier in front of the formula is usually omitted.

For a set T of TGDs and a database instance D we denote by
Chase(T, D) the least fixpoint of the chase operation: if the body
(the left-hand side) of some TGD is satisfied in the current database,
and the head (the right-hand side) is not satisfied, then add to the
database a new constant being the free witness for the existential
formula in the head. By $\mathit{Chase}^i(T, D)$ we mean the structure
being the i-th stage of the fixpoint procedure (so that
$\mathit{Chase}^0(T, D) = D$). Clearly, we have
$\mathit{Chase}(T, D) \models D, T$, but there is no reason to think that
$\mathit{Chase}^i(T, D) \models T$ for any $i \in \mathbb{N}$.
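
The stage-by-stage procedure described above can be sketched in a few
lines of Python. This is only an illustration under our own assumptions:
atoms are encoded as (predicate, tuple of terms), a rule as a triple
(body, head, existential variables), and the names matches and chase are
hypothetical helpers, not notation from the paper. A trigger is skipped
when its head already has a witness, and the number of stages is bounded
because the chase itself may be infinite.

```python
from itertools import count

# An atom is (predicate, tuple_of_terms). A rule is (body, head, exist_vars):
# body is a list of atoms over variables, head is a single atom, and
# exist_vars lists the head variables that are existentially quantified.

def matches(body, facts, binding=None):
    """Yield every extension of `binding` that maps all body atoms into facts."""
    binding = dict(binding or {})
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(binding)
        if all(extended.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from matches(rest, facts, extended)

def chase(db, rules, stages):
    """Return the facts present after at most `stages` rounds of rule firing."""
    facts = set(db)
    fresh = count()
    for _ in range(stages):
        added = set()
        for body, (head_pred, head_args), exist_vars in rules:
            for binding in matches(body, facts):
                head = [(head_pred, head_args)]
                # Skip the trigger when the head already has a witness.
                if next(matches(head, facts | added, binding), None) is not None:
                    continue
                for v in exist_vars:
                    binding[v] = f"_n{next(fresh)}"  # new constant (free witness)
                added.add((head_pred, tuple(binding[v] for v in head_args)))
        if not added:
            break  # fixpoint reached
        facts |= added
    return facts
```

For the single rule E(x, y) => exists z E(y, z) the procedure never
reaches a fixpoint, which is exactly the situation in which the finite
stages fail to be models of T.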

Since Chase(T, D) is a "free structure", it is very easy to see
that for any query $\Phi$ (being a union of positive conjunctive queries,
or UCQ; all queries we consider in this paper are positive) we have
$D, T \models \Phi$ (which reads as "$\Phi$ is certainly true in D, in
the presence of T") if and only if $\mathit{Chase} \models \Phi$.

It is easy to see that query answering in presence of TGDs is
undecidable. As usual in such situations, many sorts of syntactic
restrictions on the dependencies are considered, which imply decidability
while keeping as much expressive power as possible. Recent new interest in
such restricted logics comes from the Datalog± project, led by Georg
Gottlob, whose aim is translating important concepts and proof
techniques from database theory to description logics and bridging an
apparent gap in expressive power between database query languages
and description logics (DLs) as ontology languages, extending the
well-known Datalog language in order to embed DLs [CGT09].

From the point of view of Datalog± and of this paper, the
interesting logics are:

Linear Datalog± programs. They consist of TGDs which, as the
body, have a single atomic formula, and this formula is joinless - each
variable in the body occurs there only once. The Joinless Logic we
consider in this paper is a generalization of Linear Datalog±, in the
sense that we no longer restrict the body of the rule to be a single
atom, but we still demand that each variable occurs in the body
only once. Let us note that allowing variable repetitions in the heads
does not change the Finite Controllability status of a program, as we
can always remember the equalities as part of the relation name, so
we w.l.o.g. assume that such repetitions are not allowed in Joinless
Logic (see the last paragraph of Section 3 for slightly more about
this issue).

Guarded Datalog± is an extension of Linear Datalog±. A TGD
is guarded if it has an atom, in the body, containing all the
variables that occur anywhere else in the body. Clearly, Linear Datalog±
programs are guarded, as they only have one atom in the body.
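
The three syntactic conditions on rule bodies described above are easy
to state as predicates. The sketch below is ours and only illustrative:
a body is assumed to be a list of atoms (predicate, tuple of variables),
and the function names are hypothetical.

```python
def is_joinless(body):
    """Each variable occurs in the body only once."""
    occurrences = [v for _, args in body for v in args]
    return len(occurrences) == len(set(occurrences))

def is_linear(body):
    """A single body atom which is, in addition, joinless."""
    return len(body) == 1 and is_joinless(body)

def is_guarded(body):
    """Some body atom contains all the variables of the body."""
    all_vars = {v for _, args in body for v in args}
    return any(all_vars <= set(args) for _, args in body)
```

Under this encoding a Cartesian-product body such as R(x, y), S(z) is
joinless but, having no atom that contains x, y and z, not guarded.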



1 We will write just Chase instead of Chase(T, D) when the context is clear.
Sticky Datalog± is a logic introduced in [CGP10] and then
extended in [CGP10+/-] as Sticky-Join Datalog±. A set T of TGDs is
sticky if some positions in the relations occurring in the rules can
be marked as "immortal" in such a way that the following conditions
are satisfied:

— If some variable occurs in an immortal position in the body of
a rule from T then the same variable must occur in an immortal
position in the head of the same rule.

— If some variable occurs more than once in the body of a rule from
T then this variable must occur in an immortal position in the head
of the same rule.
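
The two conditions above can be checked mechanically once a candidate
marking is given. The following sketch is an illustration under our own
assumptions: a rule is a pair (body, head) of atoms encoded as
(predicate, tuple of variables), a marking is a set of
(predicate, position) pairs with 0-based positions, and the name
respects_marking is hypothetical. It only verifies a given marking; it
does not search for one.

```python
from collections import Counter

def respects_marking(rules, immortal):
    """Check both stickiness conditions for a given set of immortal positions."""
    for body, (head_pred, head_args) in rules:
        # Variables that occupy an immortal position in the head.
        kept = {v for i, v in enumerate(head_args) if (head_pred, i) in immortal}
        # Condition 1: a variable in an immortal body position stays immortal.
        for pred, args in body:
            for i, v in enumerate(args):
                if (pred, i) in immortal and v not in kept:
                    return False
        # Condition 2: a variable repeated in the body stays immortal.
        repeats = Counter(v for _, args in body for v in args)
        if any(n > 1 and v not in kept for v, n in repeats.items()):
            return False
    return True
```

For example, the join rule R(x, y), S(y, z) => T(y, w) passes only if
the first position of T is marked, since y is the repeated variable.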

Let us remark here that the above property, which we use as a
definition of Sticky Datalog±, is actually called "the sticky-join property"
in [CGP12], and is a consequence of slightly more complicated
definitions of both Sticky Datalog± in [CGP10] and Sticky-Join Datalog±
in [CGP10+/-] (see Theorem 4.3 in [CGP12]). This means that
Theorem 1 of our paper holds both for Sticky Datalog± and Sticky-Join
Datalog±. Actually, the difference between the two logics can only be
seen if repeated variables in the heads of the rules are allowed and,
as we said before, from the point of view of Finite Controllability we
can disallow them w.l.o.g.

Apart from decidability, the properties of such logics which are 
considered desirable are: 

Bounded Derivation Depth property (BDD). A set T of
TGDs has the bounded derivation depth property if for each query $\Phi$
(the queries we are interested in are UCQs), there is a constant
$k_\Phi \in \mathbb{N}$, such that for each database instance D if
$\mathit{Chase}(T, D) \models \Phi$ then
$\mathit{Chase}^{k_\Phi}(T, D) \models \Phi$. The BDD property turns out
to be equivalent to first order rewritability [CGT09]: T has the BDD
property if and only if for each UCQ $\Phi$ there exists a UCQ $\Psi$
such that for each database instance D it holds that
$\mathit{Chase}(D, T) \models \Phi$ if and only if $D \models \Psi$.

Finite Controllability (FC). A set T of TGDs has the finite
controllability property if for each database instance D and each query
$\Phi$ such that $\mathit{Chase}(T, D) \models \neg\Phi$ there exists a
finite structure M such that $M \models T, D, \neg\Phi$.
A logic is said to have property $P \in \{BDD, FC\}$ if each T in this
logic has it.

It is usually very easy to see whether a logic has the BDD prop- 
erty. And it is usually very hard to see whether it has the FC prop- 
erty. 

The query answering problem for Linear Datalog± (or rather for
Inclusion Dependencies, which happens to be the same notion as
Linear Datalog±) was shown to be decidable (and PSPACE-complete)
in [JK84]. The problem which was left open in [JK84] was finite
controllability - since we mainly consider finite databases, we are
not quite happy with the answer that "yes, there exists a database
$\bar{D}$, such that $\bar{D} \models T, D, \neg\Phi$" if all
counterexamples $\bar{D}$ for $\Phi$ we can produce are infinite. This
problem was solved by Rosati [R06], who proved, by a complicated
argument, that IDs (Linear Datalog±) have the finite controllability
property. His result was improved in [BGO10] where finite
controllability is shown for Guarded Datalog±.

Sticky Datalog± was introduced in [CGP10], where it was also
shown to have the BDD property and where the question of the FC
property of this logic was stated as an open problem. The argument,
given in [CGP10], motivating the study of Sticky Datalog± is that
it can express assertions having compositions of roles in the body,
which are inherently non-guarded. Sticky sets of TGDs can express
constraints and rules involving joins. We are convinced that the
overwhelming number of real-life situations involving such constraints can
be effectively modeled by sticky sets of TGDs. Of course, since
query-answering with TGDs involving joins is undecidable in general, we
somehow needed to restrict the interaction of TGDs, when joins are
used. But we believe that the restriction imposed by stickiness is a
very mild one. Only rather contorted TGDs that seem not to occur
too often in real life violate it. For example, each singleton
multivalued dependency (MVD) is sticky, as are many realistic sets of MVDs
[CGP10].

1.1 Our contribution 

We show two finite controllability results. Probably the more impor- 
tant of them is: 

Theorem 1. Sticky Datalog± has the finite controllability property.
But this is merely a corollary to a theorem that we consider the main
technical achievement of this paper:

Theorem 2. Joinless Logic has the finite controllability property. 

To prove Theorem 2 we propose a technique, which we think is
quite elegant, and which relies on two main ideas. One is that we carefully
trace the relations (we call them family relations) between elements
of Chase which are ever involved in one atom. The second idea is to
consider an infinite sequence of equivalence relations, defined by the
types of family relations in which the elements (and their ancestors)
are involved, and construct an infinite sequence of models as the
quotient structures of these equivalence relations. This leads to a
sequence of models that, in a sense, "converges" to the Chase.

As for the Joinless Logic as such, we prefer not to make
exaggerated claims about its importance. We see it just as a
mathematical tool - the Chase resulting from a Joinless theory is a
huge structure, much more complicated than the bounded tree-width
Chase resulting from guarded TGDs, and the ability to control it
can give insight into chases generated by logics enjoying better
practical motivation - Theorem 1 serves here as a good example. But
still Theorem 2 is a very strong generalization of the result of Rosati
about Linear Datalog±, which itself was viewed as well motivated,
while the technique we develop in order to prove it is powerful enough
to give, as a by-product, an easier proof of the finite controllability
result for sets of guarded TGDs [BGO10] (see Section 9). It also
appears that rules with Cartesian products, even joinless, can be seen
as interesting from some sort of practical point of view, motivated by
Description Logics (where they would be called "concept products").
After all, "All Elephants are Bigger than All Mice" [RKH08].

1.2 Open problem: BDD/FC conjecture

Does the BDD property imply FC? In the proof of Theorem 1 we do not
seem to use much more than just the fact that Sticky Datalog±
has the BDD property.
1.3 Outline of the technical part

In Section 2 Theorem 1 is proved, as a corollary to Theorem 2. The
proof of Theorem 2, which is the main technical contribution of this
paper, is presented in Sections 3-8. Finally, in a very short Section
9, we comment on the relations between our construction and the
FC property for guarded sets of TGDs.

2 From Joinless Logic to Sticky Datalog±

For a sticky set of TGDs T let $T_0$ be the subset of T that consists
of all the joinless rules in T.

A pair D, T, where D is a database instance, will be called weakly
saturated if $D \models T_0$. So if D, T is weakly saturated then each new
element in Chase(D, T) must have some (sticky) join in its
derivation.

Suppose now that Sticky Datalog± does not have the FC property,
and that some sticky set of TGDs T, of maximal predicate arity
$l > 0$, some finite database instance D and some query $\Phi$ are a
minimal counterexample for FC. When we say "minimal" here we
mean that l is the smallest possible. By a "counterexample for FC"
we mean that $\mathit{Chase}(D, T) \not\models \Phi$ but
$M \models \Phi$ for every finite model M of D and T. The following two
lemmas show that the above assumption leads to a contradiction:

Lemma 3 The pair D,T is not weakly saturated. 

Lemma 4 There is a finite database instance M such that the pair
M, T is weakly saturated and M, T and $\Phi$ are also a counterexample
for FC.

Proof of Lemma 3. Suppose the pair D, T is weakly saturated.
Let $T_D$ be the set consisting of:

— all dependencies $\sigma(\tau)$ such that $\tau \in T$ and
$\sigma : \mathit{Var}(\tau) \to D$ is a partial substitution. By
"partial substitution" we mean here a mapping that assigns constants
from D to some of the variables from $\mathit{Var}(\tau)$;

— all the atoms true in D.

We use the notation D here also to denote the active domain of D.

Now the trick is that we do not see the constants any more.
Or rather we see them, but as a part of the name of the predicate
forming the atom - so that P(x, y), P(x, a), P(a, y), P(b, y), P(a, b)
are now understood to be atoms of five different relations, one of
them of arity 2, three of arity 1 and one of arity 0.
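
The renaming trick can be illustrated on the five atoms listed above.
The helper below is a sketch under our own encoding: the new predicate
name is the pair (old name, pattern), where the pattern records the
constants and uses "_" as a placeholder for variable positions. The
function name absorb_constants is hypothetical.

```python
def absorb_constants(pred, args, constants):
    """Move the constants of an atom into the name of its predicate."""
    pattern = tuple(a if a in constants else "_" for a in args)
    variables = tuple(a for a in args if a not in constants)
    return (pred, pattern), variables

constants = {"a", "b"}
atoms = [("x", "y"), ("x", "a"), ("a", "y"), ("b", "y"), ("a", "b")]
renamed = [absorb_constants("P", args, constants) for args in atoms]
names = [name for name, _ in renamed]
arities = sorted(len(variables) for _, variables in renamed)
```

Here names contains five different predicate names and arities lists
the remaining numbers of arguments of the five new relations.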

Clearly, there is a canonical bijection between the elements of
$\mathit{Chase}(T, D) \setminus D$ and $\mathit{Chase}(T_D, \emptyset)$.
Each relation in Chase(T, D) can be defined as a disjunction of a finite
number of relations in $\mathit{Chase}(T_D, \emptyset)$ and thus each UCQ
$\Phi$ in the language of T can be rewritten as some UCQ $\Phi'$ in the
language of $T_D$. And each finite model of $T_D$ can be
seen (after adding the constants from D and forgetting that we used
to read the relation names in the strange way) as a finite model of
T, D.

Now define $T'$ as the result of removing from $T_D$ all the rules
involving any relation of arity l. We have
$\mathit{Chase}(T_D, \emptyset) = \mathit{Chase}(T', \emptyset)$.
This is because the pair D, T was weakly saturated, so each new
atom derived by a single application of a rule from $T_D$ to facts from
$T_D$ must have a constant from D on a position which was immortal
in T, and - by stickiness - any atom derived later must contain this
constant. It is an easy exercise for the reader to verify that if there
existed a finite structure $M'$ such that $M' \models T', \neg\Phi'$ then
a finite structure M such that $M \models T_D, \neg\Phi'$ could be
constructed. So D, $T'$ and $\Phi'$ are a counterexample for FC, with
maximal arity of relations being equal to $l - 1$. □

Proof of Lemma 4. Since Sticky Datalog± enjoys the BDD property,
we know that there exists a query $\Psi$ such that for each database
instance F it holds that $F \models \Psi$ if and only if
$\mathit{Chase}(F, T) \models \Phi$.

Let $\bar{D} = \mathit{Chase}(D, T_0)$. Clearly,
$\mathit{Chase}(\bar{D}, T) = \mathit{Chase}(D, T)$.
So $\bar{D} \not\models \Psi$.

Since $T_0$ is joinless, we know, from Theorem 2, that there exists
a finite structure M such that $M \models T_0, \bar{D}, \neg\Psi$, so in
particular the pair M, T is weakly saturated.



2 Obviously, x, y are variables and $a, b \in D$.
Since $M \not\models \Psi$ we get $\mathit{Chase}(M, T) \not\models \Phi$.
It remains to show that for each finite structure N, if
$N \models M, T$ then $N \models \Phi$. But this
follows directly from the assumption, as $M \models D$. □

3 Theorem 2 - plan of the proof and our first
little trick

We are planning to construct, for given D and T, an infinite sequence
of finite structures $M_n$, which will "converge" to Chase, in the sense
that the following property will be satisfied:

Property 5 (i) $M_n \models D, T$ for each $n \in \mathbb{N}$.

(ii) For each query $\Psi$ and each $n \in \mathbb{N}$, if
$M_n \not\models \Psi$ then $M_{n+1} \not\models \Psi$.

(iii) For each query $\Phi$, if $\mathit{Chase} \not\models \Phi$ then
there exists $n \in \mathbb{N}$ such that $M_n \not\models \Phi$.

$\Psi$ is meant here to be a CQ or a UCQ (union of CQs).

Definition 6 A formula $\Phi$ will be called M-true if
$M_n \models \Phi$ for each $n \in \mathbb{N}$.

Suppose that a sequence $M_n$ satisfying Property 5(i),(ii) is
constructed. Then:

Lemma 7 (first little trick) If $\Phi$ is an M-true UCQ then there
exists a disjunct of $\Phi$ which is M-true.

Proof: By Property 5(ii) all conjunctive queries true in $M_{n+1}$ are
also true in $M_n$. Since $\Phi$ is true in each $M_n$, some disjunct
of $\Phi$ must be true infinitely often, and therefore in each $M_n$. □

The rest of the paper is organized as follows. In Section 4 family
patterns are discussed, which constitute the body of our vehicle. In
Section 5 the sequence $M_n$ is defined and we present our second little
trick, which is the main engine of the proof. In a very short Section
6 a trivial case of cyclic queries (whatever it means) is considered.
In Section 7 we define a normal form of a conjunctive query and use
our two little tricks to show a sort of normal form theorem:

Lemma 8 For each M-true CQ $\varphi$ there exists a CQ $\beta$ in the
normal form such that (*) $\beta$ is M-true and
(**) $\mathit{Chase} \models (\beta \Rightarrow \varphi)$.
Then, in Section 8 we prove:

Lemma 9 If $\varphi$ is in the normal form and $M_0 \models \varphi$
then $\mathit{Chase} \models \varphi$.

As a corollary we get our main technical Lemma which, due to
Lemma 7, implies Theorem 2:

Theorem 10. If a conjunctive query $\varphi$ is M-true then
$\mathit{Chase} \models \varphi$.

Empty database and joinless rule heads. From now on we
will assume that D is empty. This can be done without loss of
generality - see the argument of representing queries (in the old language)
as finite disjunctions of queries (in the new language) from the proof
of Lemma 3. Once D is empty, all the atoms in Chase are produced
by the rules of T, so we can rewrite the program to make sure that
repeated variables in the heads of the rules are unnecessary because
they are remembered as a name of a predicate. Clearly, the argument
of representing queries as finite disjunctions would need to be used
again here.

We also assume (also w.l.o.g.) that each rule from T is either
of the form $Q(\bar{x}) \Rightarrow Q'(\bar{y})$, where
$\bar{y} \subseteq \bar{x}$, or
$Q_0(\bar{x}_0) \wedge Q_1(\bar{x}_1) \Rightarrow \exists y\, Q(y, \bar{x}_0, \bar{x}_1)$.

4 On the importance of family values 

Let l be the maximal arity of the predicates in the signature under
consideration.

Imagine a family of at most l members having dinner together.
We will be interested in its family pattern - the complete information
about the family relations between the diners. An important part of
it is family ordering - the information about the ancestor relation
within the family. All the families we are going to consider will be
tree-like in this respect:

Definition 11 By a family ordering we mean any union of ordered
trees, whose set of vertices is $\{1, 2, \ldots, k\}$ where $k \le l$.
If a family ordering is a tree then 1 is the root of this tree.

If a family ordering is a tree, the youngest family member is
understood to be the root of the tree. If Alice dines with her parents,
we have a tree with Alice as the root, and two leaves. If Alice dines
only with her boyfriend Bob, they form a family ordering consisting
of two elements and no edges - this is why we need unions of trees
rather than just trees.

But family ordering alone is not everything we want to know 
about a family. Alice dining only with her granny form the same 
ordering as Alice dining with her mother, but they do not form the 
same family pattern: 

Definition 12 A family pattern is a pair $F, \delta$, where F is a
family ordering and $\delta$ is a function assigning a number, from the set
$\{1, 2, \ldots, l\}$, to each pair j, i of elements of F such that
$i <_F j$, where $<_F$ is the ancestor relation on F (i is an ancestor
of j).

Clearly, once l is fixed, the set of all possible family patterns is
finite.

If j, i are members of some family, with pattern $F, \delta$, and i is an
ancestor of j then the value of $\delta(j, i)$ should be understood as
"how j addresses i". One can imagine that father or maternal grandmother
are possible values for $\delta(j, i)$.
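
To make the distinction between orderings and patterns concrete, the
sketch below encodes a family pattern as plain Python data. The encoding
is our own assumption: the ordering is given as a set of pairs (i, j)
meaning "i is an ancestor of j", the function delta is a dictionary
keyed by (j, i), and the numeric values chosen for "mother" and "granny"
are purely hypothetical. The check covers only the shape of delta; it
does not verify that the ordering is a union of trees.

```python
def is_family_pattern(k, ancestor_pairs, delta, max_arity):
    """Partial sanity check of a candidate family pattern on {1, ..., k}."""
    members = set(range(1, k + 1))
    if k > max_arity:
        return False
    if any(i == j or i not in members or j not in members
           for i, j in ancestor_pairs):
        return False
    # delta(j, i) must be defined exactly when i is an ancestor of j.
    expected_keys = {(j, i) for i, j in ancestor_pairs}
    if set(delta) != expected_keys:
        return False
    return all(1 <= value <= max_arity for value in delta.values())

# Alice is member 1, her single dinner companion is member 2.
ordering = {(2, 1)}            # 2 is an ancestor of 1
with_mother = {(1, 2): 2}      # hypothetical value standing for "mother"
with_granny = {(1, 2): 3}      # hypothetical value standing for "granny"
```

Both dinners share the same ordering, yet with_mother and with_granny
are different functions, so they give two different family patterns.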

We are soon going to see what the notions are good for. But first 
we need: 

A remark about notations. For any syntactic object X by
Var(X) we will mean the set of all the variables in X. Symbol Q will
be used to denote relation symbols. Letters P, R and T will denote
atoms of variables. Letters A, B, C, D will denote atoms of elements
of Chase. PP will be used for parenthood predicates and sometimes
also for parenthood atoms. To denote elements of Chase we will use
a, b, c, d, while i, j, k will always be positions in atoms. F, G will be
family orderings. For an atom $B = P_{F,\delta}(b_1, b_2, \ldots, b_k)$
(where $b_1, b_2, \ldots, b_k$ are constants in Chase) we define a
notation $B(i) = b_i$. The same applies for atoms of variables.



Definition 13 A set of joinless TGDs T respects family patterns if:

1. Each relation R of arity k in the signature of T contains, as a
part of its name (as a subscript) a family pattern, with the family
ordering F having exactly k vertices.

2. If $R \Rightarrow P$ is a rule of T, where $R = Q_{F,\delta}(\bar{x})$,
$P = Q'_{G,\gamma}(\bar{y})$ then $\bar{y} \subseteq \bar{x}$, and if
$R(i) = P(j)$ and $R(i') = P(j')$ then:

— $i <_F i'$ if and only if $j <_G j'$

— if $i <_F i'$ then $\delta(i', i) = \gamma(j', j)$.

By $\bar{y} \subseteq \bar{x}$ we mean that each element of the tuple
$\bar{y}$ occurs in $\bar{x}$.

3. If $R \wedge R' \Rightarrow \exists z\, P$ is a rule of T, where arity
of Q is k, arity of Q' is k', $R = Q_{F,\delta}(\bar{x})$ and
$R' = Q'_{F',\delta'}(\bar{y})$ then $P = S_{G,\gamma}(z, \bar{x}, \bar{y})$
for some $S_{G,\gamma}$ and:

— $i <_G j \Leftrightarrow (j = 1 \wedge i > 1) \vee (i - 1 <_F j - 1 \wedge 1 < i, j \le k + 1) \vee (i - k - 1 <_{F'} j - k - 1 \wedge k + 1 < i, j \le k + k' + 1)$

— If $j = 1$ and $1 < i \le k + k' + 1$ then $\gamma(i, j) = i$.
If $1 < i, j \le k + 1$ then $\gamma(i, j) = \delta(i - 1, j - 1)$.
If $k + 1 < i, j \le k + k' + 1$ then
$\gamma(i, j) = \delta'(i - k - 1, j - k - 1)$.

4. The signature of T is a union of two disjoint sets: parenthood
predicates (sometimes called PPs), which occur as the right hand
sides of rules as in Condition (3), and projection predicates, which
occur as the right hand sides of rules as in Condition (2).

5. For each projection predicate Q there is a parenthood predicate Q'
such that $Q(\bar{t}) \Rightarrow \exists t\, Q'(t, \bar{t})$ and
$Q'(t, \bar{t}) \Rightarrow Q(\bar{t})$ are rules of T.

Let us use our running metaphor to explain what is going on:
Condition (1) says that relation atoms should be understood as
families. To see the meaning of Condition (2) imagine that Alice used to
live with her two ancestors: her father and her paternal grandmother,
whom she called "granny". Then something very sad happened, and
now she lives only with her grandmother. Condition (2) says that the
grandmother is still her ancestor, and Alice still calls her "granny".

Now Condition (3). There were two families. Now they somehow
have a child together, so they form one family. Each of the members
of the two families is this child's ancestor. The child learns to address
his ancestors by their positions in the family ordering, as he sees
the ordering at the moment of his birth. The child's birth does not
change the way his ancestors are addressing each other (notice that
we do not care how the x's address the y's and vice versa - maybe
they do not talk to each other at all?).

Conditions (4) and (5) cost nothing, and they help us to keep the
proofs short. They are not exactly about family patterns but it was
convenient to hide them all in one definition.

Now notice that, without loss of generality, we can assume that 
the set of TGDs under consideration T respects family patterns. 
This can be enforced by introducing new predicate names - one new 
predicate for each old predicate and for each way of arranging its 
arguments into family pattern. As each old predicate can be now 
seen as a disjunction of new predicates, by distributivity each UCQ 
can be rewritten as an equivalent UCQ in the new context. From 
now on we will assume that T respects family patterns. 

Notice also, that due to conditions (4) and (5) each CQ can be 
always seen as a conjunction of parenthood predicates (possibly with 
additional fresh variables). From now on we will assume that only 
parenthood predicates occur in queries. 

The following Lemma is an obvious consequence of the above 
assumption: 

Lemma 14 For each element a of Chase there exists exactly one
parenthood predicate atom $A = PP(a, \bar{a})$ such that
$\mathit{Chase} \models A$. It will be called the Parenthood Atom of a,
and the elements of $\bar{a}$ will be called parents of a.

Definition 15 For two elements a, b of Chase we will say that a
and b are 0-equivalent (denoted $a \equiv_0 b$) if the Parenthood Atoms of
a and b are atoms of the same predicate. Suppose $a \equiv_0 b$, and A and
B are Parenthood Atoms of a and b (resp.). Then, for each i, the
elements A(i) and B(i) will be called respective parents of a and b.

The next lemma can be easily proved by induction on the
structure of Chase. Its second part says, using our running metaphor,
that the way an element addresses its ancestors does not change
during its lifetime:

Lemma 16 If $\mathit{Chase} \models P$, for $P = Q_{F,\delta}(\bar{b})$
and $i <_F j$ then P(i) is a parent of P(j). If
$R = PP_{G,\gamma}(\bar{a})$ is the parenthood atom of P(j)
then $P(i) = R(\delta(i, j))$.

Now we have something slightly more complicated. The following
lemma, which is not going to be needed before Section 8, is where
the whole power of family patterns is used:
Definition 17 For a family ordering F and positions
$i_1, i_2, \ldots, i_s \in F$ we define the set
$PY_F(i_1, i_2, \ldots, i_s)$ of positions in F as
$\bigcap_{d=1}^{s} \{j : \neg\, j <_F i_d\}$. When the context is clear
we will write PY instead of $PY_F$.

PY stands here for "possibly younger" - this is exactly the set
of family members who potentially can be younger than each of
$i_1, i_2, \ldots, i_s$.
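
The set PY is straightforward to compute. The sketch below is our own
illustration, assuming that the ancestor relation is given as a
transitively closed set of pairs (i, j) meaning "i is an ancestor of j",
and reading the definition with the strict relation as written above.
The name possibly_younger is hypothetical.

```python
def possibly_younger(positions, ancestor_pairs, marked):
    """Positions j that are not an ancestor of any of the marked positions."""
    return {j for j in positions
            if not any((j, i) in ancestor_pairs for i in marked)}

# Alice (position 1) dining with her two parents (positions 2 and 3).
positions = {1, 2, 3}
ancestor_pairs = {(2, 1), (3, 1)}
```

With this family, marking Alice excludes both of her parents, while
marking a parent excludes nobody.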

Lemma 18 (About the future.) Let $A = PP_{F,\delta}(a, \bar{a})$ and
suppose $\mathit{Chase} \models A$. Suppose $i_1, i_2, \ldots, i_s$ are
pairwise incomparable by $<_F$ positions in F and let
$b_1, b_2, \ldots, b_s$ be equal to $A(i_1), A(i_2), \ldots, A(i_s)$
respectively. Suppose $d_1, d_2, \ldots, d_s$ is another tuple of elements
of Chase such that $b_1, b_2, \ldots, b_s \equiv_0 d_1, d_2, \ldots, d_s$.
Then there exists an atom $C = PP_{F,\delta}(c, \bar{c})$, such that:

1. $\mathit{Chase} \models C$;

2. $d_1, d_2, \ldots, d_s$ equal $C(i_1), C(i_2), \ldots, C(i_s)$
respectively;

3. if $j \in PY_F(i_1, i_2, \ldots, i_s)$ then $A(j) \equiv_0 C(j)$;

4. if i is a position in F and $i <_F i_m$ then $A(i)$ and $C(i)$ are
respective parents of $A(i_m)$ and $C(i_m)$.

The lemma says that the potential of forming atoms in Chase
only depends on the $\equiv_0$ equivalence class of elements (and tuples of
independent elements), not on the elements themselves.

Proof. First of all notice that Claim (4) follows directly from Claim
(2) and from Lemma 16.

For the proof of Claims (1)-(3) we will consider (a fragment of)
the derivation tree of A in Chase, which we will call $\mathcal{D}$:

— Atom A is a root of $\mathcal{D}$ (and thus an inner node of
$\mathcal{D}$). Positions $i_1, i_2, \ldots, i_s$ are marked in A.

— Suppose an atom $B = P_{G,\gamma}(e, \bar{e})$ is an inner node of
$\mathcal{D}$ and $\mathit{Chase} \models B$. Suppose
$B' = P_{G',\gamma'}(\bar{e}_1)$ and $B'' = P_{G'',\gamma''}(\bar{e}_2)$
are two atoms, true in Chase, such that B was derived in Chase,
from B' and B'', by a single use of the rule
$X' \wedge X'' \Rightarrow \exists x\, X$,
where $X' = P_{G',\gamma'}(\bar{x}_1)$, $X'' = P_{G'',\gamma''}(\bar{x}_2)$
and $X = P_{G,\gamma}(x, \bar{x})$.
Then B' and B'' are nodes of $\mathcal{D}$. If position i was marked in B
and $X(i) = X'(j)$ then position j is marked in B'. Similarly, if
position i was marked in B and $X(i) = X''(j)$ then position j is
marked in B''. The case when B was derived by a projection rule
$X' \Rightarrow X$ is handled analogously.

— A node with no marked positions is a leaf, called an unmarked
leaf. A node which is a PP atom, and whose only marked position
is its root is a leaf, called a marked leaf. All other nodes are inner
nodes.

The idea here is that we trace the derivation of A back to the
PP atoms of the elements $b_i$. The way we formulated it was a bit
complicated, but we could not simply write "an atom is a leaf of
$\mathcal{D}$ if it does not contain any of $b_1, b_2, \ldots, b_s$".
This was due to the fact that the b's can occur in the derivation not
only on important positions - the positions that lead to the i's in A -
but also on unimportant positions, not connected, by the rules of T, to
any of the i's in A.

Now, once we have $\mathcal{D}$, consider another derivation
$\mathcal{D}'$, with the underlying tree isomorphic to $\mathcal{D}$,
defined as follows:

— If B is an unmarked leaf of $\mathcal{D}$ then B is also the
respective leaf of $\mathcal{D}'$.

— If B is a marked leaf of $\mathcal{D}$, which means that B is the
Parenthood Atom of some $b_i$, then the Parenthood Atom of $d_i$ is the
respective leaf of $\mathcal{D}'$.

— If B is an inner node of $\mathcal{D}$, being a result of applying some
rule from T to atoms B' and B'' (or just to B') then the same
rule is used to create the respective atom in $\mathcal{D}'$.

Notice that if T was not joinless, the last step would not always
be possible. Now, the atom in the root of $\mathcal{D}'$ is the C from
the Lemma. □

5 The canonical models Mn and our second 
little trick 

In this section we first define an infinite sequence of finite models
$M_n$, which will "converge" to Chase. Then Lemma 24, which is the
main engine of our machinery, will be proved.

Definition 19 — By the 1-history of an element $a \in \mathit{Chase}$
(denoted as $H^1(a)$) we mean the set consisting of all the parents of a.

— The (n+1)-history of x is defined as
$H^{n+1}(x) = \{x\} \cup \bigcup_{y \in H^1(x)} H^n(y)$.

Consider an infinite well-ordered set of colors. For each natural
number k we need to define the k-coloring of Chase:

Definition 20 — The k-coloring is the coloring of elements of
Chase such that each element of Chase has the smallest color
not used in its k-history.

— Define $type^k_1$ of an element a as the k-color of this element, and
$type^k_{n+1}$ of a as a tuple consisting of the k-color of a and the
tuple of $type^k_n$ of all the parents of a.

Definition 21 Two elements $a, b \in \mathit{Chase}$ are (n+1)-equivalent
(denoted as $a \equiv_{n+1} b$) if they are of the same
$type^{n+1}_{n+1}$, their Parenthood Atoms are atoms of the same
predicate, they are n-equivalent, and all their respective parents are
n-equivalent.

The reader should not feel too confused by the colors and
types. They will only be needed to deal with one trivial case. The
real message is that "If $a \equiv_{n+1} b$ then their Parenthood Atoms
are atoms of the same predicate and all their respective parents are
n-equivalent".

It is easy to see that $\equiv_n$ is an equivalence relation of finite
index.

Definition 22 $M_n = \mathit{Chase}/\!\equiv_n$. Relations on $M_n$ are
defined, in the natural way, as minimal relations such that the quotient
mapping is a homomorphism.
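
The quotient construction can be sketched for a finite set of facts. The
code is ours and only illustrative: class_of is assumed to map each
element to (a name of) its equivalence class, and the parity-based
equivalence on a four-element chain is a toy stand-in for the relations
defined above, chosen only to show how identifying elements can close a
cycle.

```python
def quotient(facts, class_of):
    """Image of every fact under the quotient mapping: the minimal
    relations that make the mapping a homomorphism."""
    return {(pred, tuple(class_of[a] for a in args)) for pred, args in facts}

# A chain a0 -> a1 -> a2 -> a3 with elements identified by index parity.
chain = {("E", ("a0", "a1")), ("E", ("a1", "a2")), ("E", ("a2", "a3"))}
parity = {"a0": 0, "a1": 1, "a2": 0, "a3": 1}
folded = quotient(chain, parity)
```

Here folded consists of the two atoms E(0, 1) and E(1, 0): a finite
structure in which every element has an E-successor, although the chain
itself has an element without one.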

Clearly, the sequence $M_n$ satisfies Property 5(ii). It is also very easy
to see that it satisfies Property 5(i) (the assumption that T is joinless
needs to be used in the proof):

Lemma 23 $M_n \models T$ for each $n \in \mathbb{N}$ (and also
$M_n \models D$, since we assume D to be empty).

The following lemma says that if $\Psi$ is M-true, then also some
simpler query, which logically implies $\Psi$, is M-true.
Lemma 24 (second little trick) Consider an M-true conjunctive
query $\Psi = P \wedge R \wedge \psi$, where P is a Parenthood Atom of
some variable x, $R = Q_{F,\delta}(\bar{w})$ and $R(i) = x$.

Let $\sigma$ be a unification, which for every position $j <_F i$ in R
identifies the variable $R(j)$ with the variable $P(\delta(i, j))$. Then
$\sigma(\Psi)$ is also M-true.

Notice that the lemma above would be clearly true if we wrote
in its statement (twice) "true in Chase" instead of "M-true". This
is since each element in Chase, so in particular the value of x, has
a unique tuple of parents. The variables in R occupying positions
$j <_F i$ must be interpreted as those parents, and in order to satisfy
$\Psi$ in Chase we have no other choice but to interpret them as the
respective parents of (the interpretation of) x. It is not that simple
in $M_n$, since an element of this structure no longer has a unique tuple
of "parents".

To prove the Lemma we first need to understand what it
means for a query to be true in $M_n$.

Definition 25 For a CQ $\varphi$ let
$Occ(\varphi) = \bigcup_{T \in \varphi} (\{1, 2, \ldots, arity(T)\} \times \{T\})$.
An n-evaluation of $\varphi$ is a function
$f : Occ(\varphi) \to \mathit{Chase}$ assigning, to each atom T from
$\varphi$ and each position i in T, an element
$f(i, T) \in \mathit{Chase}$, in such a way that:

(*) for each pair of atoms T, T' in $\varphi$, if $T(i) = T'(i')$ then
$f(i, T) \equiv_n f(i', T')$.

(**) for each atom T in $\varphi$ it holds that
$\mathit{Chase} \models f(T)$.

Where by f(T) we mean the atomic formula resulting from
replacing, in T, each T(i) (which is a variable) by f(i, T) (which is an
element of Chase).

It is easy to see that:

Lemma 26 $M_n \models \varphi$ if and only if there exists an
n-evaluation of $\varphi$.

Proof (of Lemma 24): We want to show that for each natural n the
query $\sigma(\Psi)$ is true in $M_n$. We know that $\Psi$ is M-true, so
$M_{n+1} \models \Psi$.

Suppose f is an (n+1)-evaluation of $\Psi$. The lemma will be proved if
we can show that f is an n-evaluation of $\sigma(\Psi)$, and in order to
show that it is enough to prove that
$f(P(\delta(i, j)), P) \equiv_n f(R(j), R)$ for
each position $j <_F i$ in R.

But we know that $f(P(1), P) \equiv_{n+1} f(R(i), R)$, and since f
satisfies condition (**) of Definition 25 we know that
$f(P(\delta(i, j)), P)$ and $f(R(j), R)$ are respective parents of
$f(P(1), P)$ and $f(R(i), R)$.
Now use the definition of the relation $\equiv_n$ to end the proof. □

6 Cyclic queries

Definition 27 Let $\varphi$ be a CQ.

1. By $\to_\varphi$ we mean the transitive (but not reflexive) relation such
that for each $x, y \in Var(\varphi)$, if there is an atom $P = Q_{F,\delta}(\bar{t})$ in
$\varphi$ and positions $i, j$ in $F$ such that $P(i) = y$, $P(j) = x$ and $i <_F j$,
then $x \to_\varphi y$.

2. $\varphi$ is called acyclic if $\to_\varphi$ is a partial order on $Var(\varphi)$ (which
means that it is antisymmetric). Otherwise it is cyclic.
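
Definition 27 can be read as a cycle test on a directed graph over the variables. The sketch below is an illustration only: the encoding of an atom as a pair (tuple of variables, set of pairs `(i, j)` meaning $i <_F j$) and the names `successor_edges` and `is_cyclic` are assumptions made here, not notation from the paper.

```python
from collections import defaultdict


def successor_edges(atoms):
    """Edges x -> y generated by the atoms, before taking transitive closure.

    atoms -- list of (variables, order); position i is variables[i - 1] and
             order is a set of pairs (i, j) meaning i <_F j (assumed encoding).
    """
    edges = defaultdict(set)
    for variables, order in atoms:
        for i, j in order:
            y, x = variables[i - 1], variables[j - 1]
            edges[x].add(y)  # P(i) = y, P(j) = x and i <_F j give x -> y
    return edges


def is_cyclic(atoms):
    """True iff the transitive closure of the edges is not antisymmetric,
    i.e. the edge graph contains a directed cycle (a self-loop counts)."""
    edges = successor_edges(atoms)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every vertex starts WHITE

    def visit(v):
        color[v] = GREY
        for w in edges.get(v, ()):
            if color[w] == GREY:
                return True  # back edge closes a cycle
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in list(edges))
```

For instance, two atoms that order the same two variables in opposite ways make the query cyclic, while either atom alone does not.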

Clearly, if $\varphi$ is cyclic then $Chase \not\models \varphi$. But it is also very easy to
see that:

Lemma 28 If $\varphi$ is a cyclic query consisting of $k$ atoms, then
$M_{k+1} \not\models \varphi$. So a cyclic query is never M-true.

The proof of the lemma is left as an exercise for the reader. Hint:
notice that, by Definition 20 and the first claim of Lemma dSl,
$M_{k+1} \models \varphi$ would imply the existence of an element of Chase
having a $(k+1)$-equivalent element in its $(k+1)$-history, which is
impossible for coloring reasons. This was the only place where we needed
to think about colors.

From now on we only consider acyclic queries. 

7 Acyclic queries and the normal form

Definition 29 Let $\varphi$ be an acyclic CQ.

1. We call a variable in $Var(\varphi)$ a master if it is a child in some
Parenthood Atom in $\varphi$. Otherwise it is called a serf.

2. By $\to^{1}_\varphi$ we mean the direct successor subrelation of $\to_\varphi$ (i.e. the
smallest subrelation of $\to_\varphi$ giving $\to_\varphi$ as its transitive closure).

3. By $x \to^{m}_\varphi y$ we mean that $x$ is minimal, with respect to $\to_\varphi$,
among the masters such that $x \to_\varphi y$. If $y$ is a serf and $x \to^{m}_\varphi y$ we
say that $y$ is a serf of $x$.

4. For an atom $P = Q_{F,\delta}(\bar{t})$ let $\mathcal{I}(P)$ denote the "set of maximal
master positions", that is the set of such non-root positions $i$ in
$P$ that the variable $P(i)$ is a master and for each $j \neq 1$ such that
$i <_F j$, $P(j)$ is a serf.

Footnote to Definition 27: When $x \to_\varphi y$ we think of $y$ as smaller
than $x$. Mnemonic hint: the arrowhead of $\to$ looks like $>$.

We are ready to define the normal form of a conjunctive query:

Definition 30 A conjunctive query $\beta$ is in the normal form if:

1. If $P$ is the Parenthood Atom of a master variable $x$, if $R = Q_{F,\delta}(\bar{t})$
is another atom, such that $R(i) = x$, and if $j$ is a position in $R$
such that $j <_F i$, then $R(j) = P(\delta(i,j))$.

2. No one is a serf of two masters.

3. If $i \in PY(\mathcal{I}(P))$ and $j \neq i$ are positions in some atom $P$, then
$P(i) \neq P(j)$.

The first of the conditions above is the one from Lemma 24 and
it reflects the idea that we can restrict our attention to queries where
variables have unique tuples of parents. The two remaining conditions
are technical and will be needed in Section 8. The rest of this
section is devoted to the proof of Lemma |5J.

For a variable $x$ and parenthood predicate $Q$ let $P^Q_x$ be an atom
having $Q$ as the relation symbol, $x$ as the child, and fresh variables
in all remaining positions. Let $\Phi_x$ be the disjunction of all possible
$P^Q_x$ (one for each $Q$).

Consider the query $\varphi \wedge \Phi$, where $\Phi = \bigwedge_{x \in Var(\varphi)} \Phi_x$.
Clearly, if $\varphi$ was M-true then $\varphi \wedge \Phi$ is also M-true - the only
new constraint is that (the element being the realization of) each
variable from $\varphi$ was born, in one way or another.

By distributivity $\varphi \wedge \Phi$ is a UCQ. By First Little Trick we know that
there is a disjunct in $\varphi \wedge \Phi$ which is M-true. Call it $\beta_0$. It is easy
to see that $\beta_0$ is a conjunction of $\varphi$ and of one $P^Q_x$ (call it $PP_x$)
for each $x \in Var(\varphi)$, so $Chase \models (\beta_0 \Rightarrow \varphi)$.

What we did so far is that we elevated the variables being serfs in
$\varphi$ to masterhood. We did it because they might have been involved in
some equalities in $\varphi$, which are beyond our control and could violate
their serf status, as described by conditions (ii) and (iii) of
Definition 30.
Let now $\beta$ be a result of the following Procedure:

    β := β_0;

    while there are P and R in β and a position i in R, which
    violate condition (i), unify them by Second Little Trick to
    get a new β.
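
The loop of the Procedure can be sketched in code. This is a hedged illustration, not the paper's construction: the dictionary encoding of atoms, the function name `enforce_condition_i`, the convention that the child of a Parenthood Atom sits at position 1, and the callable `delta(r, i, j)` standing in for $\delta(i,j)$ are all assumptions introduced here.

```python
def enforce_condition_i(atoms, delta):
    """Repeat the unification of Second Little Trick until condition (i) holds.

    atoms -- list of dicts with keys:
               'vars'       list of variable names, position i is vars[i - 1]
               'order'      set of pairs (j, i) meaning j <_F i
               'parenthood' True for a Parenthood Atom (child at position 1)
    delta -- callable (atom, i, j) -> position in the Parenthood Atom
             (hypothetical stand-in for the paper's delta)
    """
    atoms = [dict(a, vars=list(a["vars"])) for a in atoms]  # work on copies

    def find_violation():
        for p in atoms:
            if not p["parenthood"]:
                continue
            x = p["vars"][0]
            for r in atoms:
                if r is p:
                    continue
                for j, i in r["order"]:
                    if r["vars"][i - 1] != x:
                        continue
                    found = r["vars"][j - 1]
                    wanted = p["vars"][delta(r, i, j) - 1]
                    if found != wanted:
                        return found, wanted
        return None

    # Each round identifies two distinct variables, so the number of
    # variables strictly decreases and the loop terminates.
    while (violation := find_violation()) is not None:
        old, new = violation
        for a in atoms:
            a["vars"] = [new if v == old else v for v in a["vars"]]
    return atoms
```

The substitution is applied to the whole query, not just to the offending atom, which is what a unification does; the order in which violations are repaired is left unspecified, as in the text.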

Clearly, this procedure terminates (as the number of variables
decreases), and the final $\beta$ satisfies (i). Notice that as $\Phi$ contained
PP atoms also for variables which had already been masters in $\varphi$,
there is some redundancy in $\beta$ - the procedure gave us a (possibly
unified) copy of as a part of 

Lemma 31 $\beta$ satisfies (ii) and (iii).

Proof of the lemma: Consider the unification step from Second
Little Trick. Suppose the variables in $R$ in positions $j <_F i$ are fresh.
Then $\sigma$ can be defined in such a way that it is the identity function
on $Var(P)$ (so in particular, $\sigma$ does not identify variables in $P$).

Let $PP^\beta_x$ be the atom $PP_x$ as it appears in $\beta$, that is $PP_x$ after all
the unifications executed during the Procedure. Let $y$ be a master
such that $x \to_\beta y$. Then it is easy to see that $y$ occurs in $PP^\beta_x$, and
$\{i : PP^\beta_x(i) = y\} \subseteq \mathcal{I}(PP^\beta_x)$.

Now let us go back to $\beta_0$. First we are going to rename its (fresh)
variables. The variable in position $i$ in the PP atom of $x$ will now
be called $(x, i, y)$ if and only if it is substituted with $y$ in $\beta$. So
for example the variable in the root of $PP_x$ will now be $(x, 1, x)$.
Notice that we did not change the formula - we still have the same
formula with fresh variables, but the complete information about $\beta$ is
already


This is not quite true. If you notice that, you know how to correct that. 
encoded in the names of the variables. Let $F_x$ be the family ordering
of atom $PP_x$.

Consider now the following procedure:

    do { take a →_β-minimal master x that was not yet considered;
         for each position i ∈ I(PP_x)
           { let y = PP^β_x(i);
             modify γ by substituting (y,1,y) for PP_x(i);
             execute Second Little Trick for the query γ,
             atoms PP_x as R, PP_y as P and position i; }
         mark x as considered; }

The new procedure performs all the unifications needed to turn
$\beta_0$ into $\beta$, but does it in an order defined by $\to_\beta$, which is only
known once we know $\beta$. We leave it as an exercise for the reader to see
that the $\gamma$ resulting from the above procedure indeed equals $\beta$ (modulo
repeated atoms).

Another exercise is that for a serf $(y,i,t)$ and master $(x,1,x)$ it
holds that $(x,1,x) \to_\beta (y,i,t)$ if and only if $y = x$. Condition (ii)
follows directly from that. Hint: Notice that now each atom $T$ in
$\Phi$ first plays the role of the $R$ from the Second Little Trick some
number of times. The names of the variables in $PY(\mathcal{I}(T))$ remain
unchanged during this phase. Then it always plays $P$, while the
respective variables in $R$ are fresh, so, according to the observation
we made at the beginning of this section, the variables in $T$ are not
being changed any more. This hint also proves that condition (iii)
holds for $\gamma$.

This ends the proof of Lemma 31 and of Lemma [Si.

8 Proof of Lemma 9

In this section we show what remains to be shown: that if $M_0 \models \psi$
and $\psi$ is in normal form then also $Chase \models \psi$. It will be done by
induction on A^. 

Definition 32 A subset $S \subseteq Var(\psi)$ is a master ideal of
$(Var(\psi), \to_\psi)$ if:

1. If $x \in S$ and $x \to_\psi y$ then also $y \in S$.

2. All maximal elements of $S$ are master variables.

The following definition shows that the difference between the
fact that $M_0 \models \psi$ and that $Chase \models \psi$ is quantitative rather than
qualitative:

Definition 33 A 0-evaluation $f$ is faithful with respect to a master
ideal $S$ if for each pair of atoms $R, P$ in $\psi$ such that
$Var(R), Var(P) \subseteq S$, if $R(i) = P(i')$ then $f(i,R) = f(i',P)$.

If $f$ is faithful with respect to $S$ then for an atom $R$ in $\psi$ such
that $Var(R) \subseteq S$ and for $z = R(i)$, we write $f(z)$ instead of $f(i,R)$.

Clearly $Chase \models \psi$ if and only if there exists a 0-evaluation
faithful with respect to $Var(\psi)$. On the other hand, since $M_0 \models \psi$,
there exists a 0-evaluation faithful with respect to $\emptyset$. We are going
to gradually reconstruct this 0-evaluation to make it more and more
faithful, until we get one faithful with respect to $Var(\psi)$. We will
need the following easy remark about 0-evaluations:

Definition 34 Suppose $f$ is a 0-evaluation, $f' : Occ(\psi) \to Chase$
is any function, and $P$ is an atom in $\psi$. We say that $f'$ is P-similar
to $f$ if:

- $f'(i,R) = f(i,R)$ for each atom $R \neq P$ and each position $i$ in
$R$;

- $Chase \models f'(P)$;

- $f'(i,P) \equiv_0 f(i,P)$ for each position $i$ in $P$.

Lemma 35 If $f$ is a 0-evaluation and $f'$ is P-similar to $f$ then $f'$
is also a 0-evaluation.

Due to an induction argument, to prove Lemma 9 it remains to
show:

Lemma 36 Let $S$ be a master ideal, and $f$ a 0-evaluation faithful
with respect to $S$. Let $x \in Var(\psi)$ be a minimal master variable not
in $S$ and $P$ the Parenthood Atom of $x$ in $\psi$. Let $S'$ be the master
ideal generated by $x$ and $S$.

Then there exists a 0-evaluation $f'$, P-similar to $f$ and faithful
with respect to $S'$.

Proof: Suppose $P = PP_{(F,\delta)}(x, \bar{x})$. We will define $f'(P)$. Then
we will check that the conditions from Definition 34 hold, so $f'$ is a
0-evaluation. Finally we will notice that $f'$ is $S'$-faithful.

Let $y_1, y_2, \ldots, y_{s_0}$ be all the master variables which are maximal
in $S$ (with respect to $\to_\psi$) and smaller than $x$. They all occur, in
F-incomparable positions $i_1, i_2, \ldots, i_s$, in $P$. Let $d_i = f(y_i)$ and let
$b_i = f(i,P)$. Clearly, since $f$ is an evaluation, we have $b_i \equiv_0 d_i$.
Notice that possibly $s > s_0$, since multiple occurrences of the $y$'s are
allowed.

Notice also that $S' \setminus S = \{P(j) : j \in PY(i_1, i_2, \ldots, i_s)\}$. This is
thanks to Definition 30 (2). It is also going to be important for us that
each of the new variables occurs in $P$ only once: due to Definition
30 (3), for $j, j' \in PY(i_1, i_2, \ldots, i_s)$, if $j \neq j'$ then $P(j) \neq P(j')$.

We are now in the situation of Lemma 18, where $A = f(P)$. Let
$C$ be as in the Lemma. For any position $j$ in $PY(i_1, i_2, \ldots, i_s)$ define
$f'(j,P)$ as $C(j)$. Notice that we can be sure (thanks to Lemma 18)
that $f'(j,P) \equiv_0 f(j,P)$.

Let now $j$ be a position in $P$ which is not in $PY(i_1, i_2, \ldots, i_s)$. That
means that the variable $P(j)$ is in $S$. Define $f'(j,P)$ as $f(P(j))$. The
condition $f'(j,P) \equiv_0 f(j,P)$ now holds trivially, since $f$ was a
0-evaluation.

We defined a function $f'$, which satisfies the first and the third
condition from Definition 34. Now we need to make sure that
$Chase \models f'(P)$. We know that $Chase \models C$, so this part of the proof
would be finished if we could show that $f'(P) = C$. Surprisingly, this is
the crucial moment, the one we spent long pages preparing for. The full
power of the normal form and family patterns is going to be used in
the next 6 lines:

Consider two positions in $P$: $i \in \{i_1, i_2, \ldots, i_s\}$ and $j <_F i$. Let
$y \in \{y_1, y_2, \ldots, y_{s_0}\}$ be such that $y = P(i)$ and let $z = P(j)$. The
variable $y$ is a master, so its Parenthood Atom, call it $P_y$, is in $\psi$.

Since $\psi$ is in the normal form, we know that $P_y(\delta(i,j)) = z$.
Since we defined $f'(j,P)$ to be $f(z)$, we get $f'(j,P) = f(\delta(i,j), P_y)$.
What we want to show is that $f'(j,P) = C(j)$. But this now follows
directly from Lemma 18.

In order to finish the proof of the Lemma we still need to notice
that $f'$ is $S'$-faithful. The atoms described by Definition 33 are now
all the atoms that were already contained in $S$, and one new atom $P$.
If $P(j)$ was in $S$ we defined $f'(j,P)$ as $f(P(j))$, so we did not spoil
anything. The only problem could be with the values assigned to
positions in $P$ with variables from $S' \setminus S$. But, as we mentioned above,
each of the new variables occurs in $P$ only once, so the condition from
Definition 33 is trivially satisfied. □



9 Remark about guarded TGDs 

The proof in Sections 3-8 can also be read as a new proof of the FC
property for guarded TGDs [BGO10]. The only difference is that,
in order to construct $M_n$, it is not enough, in the guarded case, to
remember by which rules the last $n$ generations of parents of an element
were born; we also need to remember what other atoms are true about
these elements. That is why a condition "and the n-histories of a and b
are isomorphic" needs to be added to Definition 21. On the other hand, all
the family orderings we would need to consider in the guarded case are
total orderings, which significantly simplifies everything - for example
the direct successor relation on the variables of a query in normal form
would now be a tree.